Scenario Discovery

The model consists of multiple external factors. These factors combined can result in either desirable or disadvantageous outcomes for the policymakers. Therefore it is of great importance to get more insight into the effects of these external factors on possible policies. That is why a scenario discovery analysis is performed. The method Patient Rule Induction Method, or PRIM, is used to categorize the scenarios and determine which ones best predicts desired outcomes given the inputs of the model. The data from the open exploration stage is used for this.

In [1]:
# Import Libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

from ema_workbench.analysis import prim
from ema_workbench import ema_logging
ema_logging.log_to_stderr(ema_logging.INFO)
C:\Users\salma\miniconda3\envs\gds\lib\site-packages\ema_workbench\analysis\prim.py:30: ImportWarning: altair based interactive inspection not available
  warnings.warn(
In [2]:
# Open file with outcomes

outcomes = pd.read_csv('./open_exploration/outcomes_3.csv', index_col=None)
outcomes.head()
Out[2]:
Unnamed: 0 A.1 Total Costs A.1_Expected Number of Deaths A.2 Total Costs A.2_Expected Number of Deaths A.3 Total Costs A.3_Expected Number of Deaths A.4 Total Costs A.4_Expected Number of Deaths A.5 Total Costs A.5_Expected Number of Deaths RfR Total Costs Expected Evacuation Costs
0 0 2.018911e+08 0.0 1.449938e+08 0.000547 1.034037e+08 0.00000 4.961730e+07 0.0 1.420865e+08 0.0 916200000.0 251.360364
1 1 2.018911e+08 0.0 1.473542e+08 0.000540 1.034037e+08 0.00000 4.961730e+07 0.0 1.420865e+08 0.0 916200000.0 250.206239
2 2 2.018911e+08 0.0 1.458849e+08 0.000528 1.034037e+08 0.00000 4.961730e+07 0.0 1.420865e+08 0.0 916200000.0 248.209112
3 3 2.018911e+08 0.0 1.442790e+08 0.000538 1.979968e+08 0.02345 4.961730e+07 0.0 1.420865e+08 0.0 916200000.0 5871.138688
4 4 2.018911e+08 0.0 2.107146e+08 0.007240 1.034037e+08 0.00000 4.961730e+07 0.0 1.420865e+08 0.0 916200000.0 3771.253650
In [3]:
# Open file with experiments

experiments = pd.read_csv('./open_exploration/experiments_3.csv', index_col=None)
experiments.head()
Out[3]:
Unnamed: 0 A.0_ID flood wave shape A.1_Bmax A.1_Brate A.1_pfail A.2_Bmax A.2_Brate A.2_pfail A.3_Bmax A.3_Brate ... A.4_DikeIncrease 0 A.4_DikeIncrease 1 A.4_DikeIncrease 2 A.5_DikeIncrease 0 A.5_DikeIncrease 1 A.5_DikeIncrease 2 EWS_DaysToThreat scenario policy model
0 0 3 119.122289 1.5 0.591532 98.631919 1.5 0.685550 164.891327 1.0 ... 3 7 9 8 4 4 3 2600 2500 dikesnet
1 1 11 346.165771 1.5 0.302154 246.604741 1.5 0.496800 37.053241 1.0 ... 3 7 9 8 4 4 3 2601 2500 dikesnet
2 2 94 199.130213 1.5 0.425568 294.793947 1.0 0.589527 236.443753 10.0 ... 3 7 9 8 4 4 3 2602 2500 dikesnet
3 3 55 219.662036 1.5 0.178258 318.247471 1.0 0.601927 110.599975 1.5 ... 3 7 9 8 4 4 3 2603 2500 dikesnet
4 4 97 324.644018 10.0 0.326461 224.554061 1.0 0.107261 313.646632 1.0 ... 3 7 9 8 4 4 3 2604 2500 dikesnet

5 rows × 54 columns

In [4]:
# Create a new column that combines all total costs/total deaths from all locations

outcomes['Total Costs'] = outcomes['A.1 Total Costs'] + outcomes['A.2 Total Costs'] + outcomes['A.3 Total Costs'] + outcomes['A.4 Total Costs'] + outcomes['A.5 Total Costs']
outcomes['Total Deaths'] = outcomes['A.1_Expected Number of Deaths'] + outcomes['A.2_Expected Number of Deaths'] + outcomes['A.3_Expected Number of Deaths'] + outcomes['A.4_Expected Number of Deaths'] + outcomes['A.5_Expected Number of Deaths']
In [5]:
outcomes.head()
Out[5]:
Unnamed: 0 A.1 Total Costs A.1_Expected Number of Deaths A.2 Total Costs A.2_Expected Number of Deaths A.3 Total Costs A.3_Expected Number of Deaths A.4 Total Costs A.4_Expected Number of Deaths A.5 Total Costs A.5_Expected Number of Deaths RfR Total Costs Expected Evacuation Costs Total Costs Total Deaths
0 0 2.018911e+08 0.0 1.449938e+08 0.000547 1.034037e+08 0.00000 4.961730e+07 0.0 1.420865e+08 0.0 916200000.0 251.360364 6.419924e+08 0.000547
1 1 2.018911e+08 0.0 1.473542e+08 0.000540 1.034037e+08 0.00000 4.961730e+07 0.0 1.420865e+08 0.0 916200000.0 250.206239 6.443528e+08 0.000540
2 2 2.018911e+08 0.0 1.458849e+08 0.000528 1.034037e+08 0.00000 4.961730e+07 0.0 1.420865e+08 0.0 916200000.0 248.209112 6.428835e+08 0.000528
3 3 2.018911e+08 0.0 1.442790e+08 0.000538 1.979968e+08 0.02345 4.961730e+07 0.0 1.420865e+08 0.0 916200000.0 5871.138688 7.358706e+08 0.023988
4 4 2.018911e+08 0.0 2.107146e+08 0.007240 1.034037e+08 0.00000 4.961730e+07 0.0 1.420865e+08 0.0 916200000.0 3771.253650 7.077133e+08 0.007240
In [6]:
experiments.columns
Out[6]:
Index(['Unnamed: 0', 'A.0_ID flood wave shape', 'A.1_Bmax', 'A.1_Brate',
       'A.1_pfail', 'A.2_Bmax', 'A.2_Brate', 'A.2_pfail', 'A.3_Bmax',
       'A.3_Brate', 'A.3_pfail', 'A.4_Bmax', 'A.4_Brate', 'A.4_pfail',
       'A.5_Bmax', 'A.5_Brate', 'A.5_pfail', 'discount rate 0',
       'discount rate 1', 'discount rate 2', '0_RfR 0', '0_RfR 1', '0_RfR 2',
       '1_RfR 0', '1_RfR 1', '1_RfR 2', '2_RfR 0', '2_RfR 1', '2_RfR 2',
       '3_RfR 0', '3_RfR 1', '3_RfR 2', '4_RfR 0', '4_RfR 1', '4_RfR 2',
       'A.1_DikeIncrease 0', 'A.1_DikeIncrease 1', 'A.1_DikeIncrease 2',
       'A.2_DikeIncrease 0', 'A.2_DikeIncrease 1', 'A.2_DikeIncrease 2',
       'A.3_DikeIncrease 0', 'A.3_DikeIncrease 1', 'A.3_DikeIncrease 2',
       'A.4_DikeIncrease 0', 'A.4_DikeIncrease 1', 'A.4_DikeIncrease 2',
       'A.5_DikeIncrease 0', 'A.5_DikeIncrease 1', 'A.5_DikeIncrease 2',
       'EWS_DaysToThreat', 'scenario', 'policy', 'model'],
      dtype='object')
In [9]:
# Looking for thresholds that will be used for PRIM
# Thresholds are based on wishes of our client: Rijkswaterstaat

plt.figure(figsize=(12, 10))
plt.suptitle('Percentage of values below the selected thresholds', fontsize=14)

# Subplot RfR Costs
plt.subplot(321)
outcomes['RfR Total Costs'].hist(edgecolor='black', alpha = 1)
# Dashed line symolising threshold
plt.vlines(1.25e9, 0, 10000, colors = "r", linestyles = "dashed")
# Calculating percentage under threshold
percentage_rfr = (np.sum(outcomes['RfR Total Costs']< 1.25e9) / len(outcomes['RfR Total Costs']))* 100
plt.title('Total RfR Costs: '+ str(percentage_rfr)+ '%')

# Subplot Expected Evacuation Costs
plt.subplot(322)
# Dashed line symolising threshold
outcomes['Expected Evacuation Costs'].hist(edgecolor='black', alpha = 1)
plt.vlines(50000, 0, 40000, colors = "r", linestyles = "dashed")
# Calculating percentage under threshold
percentage_eec = (np.sum(outcomes['Expected Evacuation Costs']<= 50000)/len(outcomes['Expected Evacuation Costs']))* 100
plt.title('Total Expected Evacuation Costs: '+ str(percentage_eec)+ '%')

# Subplot Total Deaths
plt.subplot(323)
outcomes['Total Deaths'].hist(edgecolor='black', alpha = 1)
# Dashed line symolising threshold
plt.vlines(0.005, 0, 40000, colors = "r", linestyles = "dashed")
# Calculating percentage under threshold
percentage_td = (np.sum(outcomes['Total Deaths']< 0.005)/len(outcomes['Total Deaths']))* 100
plt.title('Total Deaths: '+ str(percentage_td)+ '%')

# Subplot Total Costs
plt.subplot(325)
outcomes['Total Costs'].hist(edgecolor='black', alpha = 1)
# Dashed line symolising threshold
plt.vlines(0.75e9, 0, 30000, colors = "r", linestyles = "dashed")
# Calculating percentage under threshold
percentage_tc = (np.sum(outcomes['Total Costs']< 0.75e9)/len(outcomes['Total Costs']))* 100
plt.title('Total Costs: '+ str(percentage_tc)+ '%')

plt.show()
In [11]:
# Set thresholds as input for PRIM
# Thresholds are based on wishes of Rijkswaterstaat

x = experiments.iloc[:, 1:-4]
y_death = outcomes['Total Deaths'] < 0.005
y_totalcosts = outcomes['Total Costs'] < 0.75e9
y_rfrcosts = outcomes['RfR Total Costs'] < 1.25e9
y_evaccosts = outcomes['Expected Evacuation Costs'] < 50000

PRIM

PRIM subspace partitioning will be used for the following outcomes: expected deaths, total costs, RfR costs and evacuation costs. The base case data will also be used with PRIM to best compare the results and effects of uncertainty.

Lastly, worst scenarios will also be determines using PRIM.

With Policies

Expected Deaths

In [75]:
# Perform PRIM algorithm using a 0.8 threshold. This is the density threshold that a box has to meet.

prim_alg = prim.Prim(x, y_death, threshold=0.8, peel_alpha=0.1)
box_death1 = prim_alg.find_box()
[MainProcess/INFO] 40000 points remaining, containing 27619 cases of interest
[MainProcess/INFO] mean: 1.0, mass: 0.0984, coverage: 0.14251059053550091, density: 1.0 restricted_dimensions: 12
In [76]:
# Plot trade off graph between density and coverage of boxes

box_death1.show_tradeoff()
box_death1.inspect(style='graph')

plt.show
Out[76]:
<function matplotlib.pyplot.show(close=None, block=None)>
In [77]:
# Plot boxes

box_death1.show_pairs_scatter()
plt.show()

Total Costs

In [78]:
# Perform PRIM algorithm using a 0.8 threshold. This is the density threshold that a box has to meet.

prim_alg = prim.Prim(x, y_totalcosts, threshold=0.8, peel_alpha=0.1)
box_totalcosts1 = prim_alg.find_box()
[MainProcess/INFO] 40000 points remaining, containing 26125 cases of interest
[MainProcess/INFO] mean: 1.0, mass: 0.127575, coverage: 0.19533014354066985, density: 1.0 restricted_dimensions: 13
In [79]:
# Plot trade off graph between density and coverage of boxes

box_totalcosts1.show_tradeoff()
box_totalcosts1.inspect(style='graph')

plt.show
Out[79]:
<function matplotlib.pyplot.show(close=None, block=None)>
In [80]:
# Plot boxes

box_totalcosts1.show_pairs_scatter()
plt.show()

Total RfR Costs

In [81]:
# Perform PRIM algorithm using a 0.8 threshold. This is the density threshold that a box has to meet.

prim_alg = prim.Prim(x, y_rfrcosts, threshold=0.8, peel_alpha=0.1)
box_rfr1 = prim_alg.find_box()
[MainProcess/INFO] 40000 points remaining, containing 29200 cases of interest
[MainProcess/INFO] mean: 1.0, mass: 0.34, coverage: 0.4657534246575342, density: 1.0 restricted_dimensions: 4
In [82]:
# Plot trade off graph between density and coverage of boxes

box_rfr1.show_tradeoff()
box_rfr1.inspect(style='graph')

plt.show
Out[82]:
<function matplotlib.pyplot.show(close=None, block=None)>
In [83]:
# Plot boxes

box_rfr1.show_pairs_scatter()
plt.show()

Evacuation Costs

In [84]:
# Perform PRIM algorithm using a 0.8 threshold. This is the density threshold that a box has to meet.

prim_alg_evaccosts = prim.Prim(x, y_evaccosts, threshold=0.8, peel_alpha=0.1)
box_evaccosts1 = prim_alg_evaccosts.find_box()
[MainProcess/INFO] 40000 points remaining, containing 39940 cases of interest
[MainProcess/INFO] mean: 1.0, mass: 0.81, coverage: 0.8112168252378568, density: 1.0 restricted_dimensions: 2
In [86]:
# Plot trade off graph between density and coverage of boxes

box_evaccosts1.show_tradeoff()
box_evaccosts1.inspect(style='graph')

plt.show
Out[86]:
<function matplotlib.pyplot.show(close=None, block=None)>
In [87]:
# Plot boxes

box_evaccosts1.show_pairs_scatter()
plt.show()

Worst Scenario

In [27]:
# Set new thresholds that encompass the worst oucomes, again based on wishes of Rijkswaterstaat

x = experiments.iloc[:, 1:-4]
y_death_w = outcomes['Total Deaths'] > 0.005
y_totalcosts_w = outcomes['Total Costs'] > 0.75e9
y_rfrcosts_w = outcomes['RfR Total Costs'] > 1.25e9
y_evaccosts_w = outcomes['Expected Evacuation Costs'] >= 50000

Death

In [89]:
# Perform PRIM algorithm using a 0.8 threshold. This is the density threshold that a box has to meet.

prim_alg = prim.Prim(x, y_death_w, threshold=0.8, peel_alpha=0.1)
box_death1 = prim_alg.find_box()
[MainProcess/INFO] 40000 points remaining, containing 12381 cases of interest
[MainProcess/INFO] mean: 0.8657407407407407, mass: 0.054, coverage: 0.15103788062353607, density: 0.8657407407407407 restricted_dimensions: 11
In [90]:
# Plot trade off graph between density and coverage of boxes

box_death1.show_tradeoff()
box_death1.inspect(style='graph')

plt.show
Out[90]:
<function matplotlib.pyplot.show(close=None, block=None)>
In [91]:
# Plot boxes

box_death1.show_pairs_scatter()
plt.show()

Total Costs

In [92]:
# Perform PRIM algorithm using a 0.8 threshold. This is the density threshold that a box has to meet.

prim_alg = prim.Prim(x, y_totalcosts_w, threshold=0.8, peel_alpha=0.1)
box_totalcosts1 = prim_alg.find_box()
[MainProcess/INFO] 40000 points remaining, containing 13875 cases of interest
[MainProcess/INFO] mean: 1.0, mass: 0.12, coverage: 0.34594594594594597, density: 1.0 restricted_dimensions: 9
In [93]:
# Plot trade off graph between density and coverage of boxes

box_totalcosts1.show_tradeoff()
box_totalcosts1.inspect(style='graph')

plt.show
Out[93]:
<function matplotlib.pyplot.show(close=None, block=None)>
In [94]:
# Plot boxes

box_totalcosts1.show_pairs_scatter()
plt.show()

RfR Costs

In [96]:
# Perform PRIM algorithm using a 0.8 threshold. This is the density threshold that a box has to meet.

prim_alg = prim.Prim(x, y_rfrcosts_w, threshold=0.8, peel_alpha=0.1)
box_rfr1 = prim_alg.find_box()
[MainProcess/INFO] 40000 points remaining, containing 10800 cases of interest
[MainProcess/INFO] mean: 1.0, mass: 0.12, coverage: 0.4444444444444444, density: 1.0 restricted_dimensions: 4
In [97]:
# Plot trade off graph between density and coverage of boxes

box_rfr1.show_tradeoff()
box_rfr1.inspect(style='graph')

plt.show
Out[97]:
<function matplotlib.pyplot.show(close=None, block=None)>
In [98]:
# Plot boxes

box_rfr1.show_pairs_scatter()
plt.show()

Evacuation Costs

In [102]:
y_evaccosts_w
Out[102]:
0        False
1        False
2        False
3        False
4        False
         ...  
39995    False
39996    False
39997    False
39998    False
39999    False
Name: Expected Evacuation Costs, Length: 40000, dtype: bool
In [28]:
# Perform PRIM algorithm using a 0.8 threshold. This is the density threshold that a box has to meet.
# Only 60 cases of 0.1% are of interest. The othe cases do not fit within the treshold.

prim_alg_evaccosts = prim.Prim(x, y_evaccosts_w, threshold=0.8, peel_alpha=0.1)
box_evaccosts1 = prim_alg_evaccosts.find_box()
[MainProcess/INFO] 40000 points remaining, containing 60 cases of interest
[MainProcess/INFO] box does not meet threshold criteria, value is 0.014375, returning dump box
In [29]:
# Plotting boxes gives empty box

box_evaccosts1.show_tradeoff()
box_evaccosts1.inspect(style='graph')

plt.show
C:\Users\salma\miniconda3\envs\gds\lib\site-packages\ema_workbench\analysis\scenario_discovery_util.py:456: UserWarning: Attempting to set identical bottom == top == -0.5 results in singular transformations; automatically expanding.
  ax.set_ylim(top=-0.5, bottom=nr_unc - 0.5)
Out[29]:
<function matplotlib.pyplot.show(close=None, block=None)>
In [31]:
# Running the following code gives the errosr: 'ValueError: No variables found for grid columns.'
# This is due to the fact that ~99.9% of results are within the treshold of Rijkswaterstaat

#box_evaccosts1.show_pairs_scatter()
#plt.show()

Wihtout Policies (base case)

In [13]:
# Open file with outcomes and experiments from the base case

outcomes_bc = pd.read_csv('./open_exploration/outcomes_basecase.csv', index_col=None)

experiments_bc = pd.read_csv('./open_exploration/experiments_basecase.csv', index_col=None)
In [15]:
# Create a new column that combines all total costs/total deaths from all locations

outcomes_bc['Total Costs'] = outcomes_bc['A.1 Total Costs'] + outcomes_bc['A.2 Total Costs'] + outcomes_bc['A.3 Total Costs'] + outcomes_bc['A.4 Total Costs'] + outcomes_bc['A.5 Total Costs']
outcomes_bc['Total Deaths'] = outcomes_bc['A.1_Expected Number of Deaths'] + outcomes_bc['A.2_Expected Number of Deaths'] + outcomes_bc['A.3_Expected Number of Deaths'] + outcomes_bc['A.4_Expected Number of Deaths'] + outcomes_bc['A.5_Expected Number of Deaths']
In [63]:
# Set thresholds as input for PRIM
# Thresholds are still based on wishes of Rijkswaterstaat
# Since there are no policies, evacuation costs and RfR costs are 0 in every scenario. These variables will be disregarded. 

x_bc = experiments_bc.iloc[:, 1:-4]
y_death_bc = outcomes_bc['Total Deaths'] < 0.005
y_totalcosts_bc = outcomes_bc['Total Costs'] < 0.75e9
In [64]:
# Expected deaths will also be disregarded as apparantly 0 results note expected deaths under 0.005

y_death_bc.sum()
Out[64]:
0

Total Costs

In [35]:
# Perform PRIM algorithm using a 0.8 threshold. This is the density threshold that a box has to meet.

prim_alg = prim.Prim(x_bc, y_totalcosts_bc, threshold=0.8, peel_alpha=0.1)
box_totalcosts1 = prim_alg.find_box()
[MainProcess/INFO] 1000 points remaining, containing 324 cases of interest
[MainProcess/INFO] mean: 1.0, mass: 0.203, coverage: 0.6265432098765432, density: 1.0 restricted_dimensions: 4
In [36]:
# Plot trade off graph between density and coverage of boxes

box_totalcosts1.show_tradeoff()
box_totalcosts1.inspect(style='graph')

plt.show
Out[36]:
<function matplotlib.pyplot.show(close=None, block=None)>
In [37]:
# Plot boxes

box_totalcosts1.show_pairs_scatter()
plt.show()
In [ ]: